Abstract

Background: Polyphenol oxidase (PPO) activity in plants is a trait with potential economic, 
agricultural and environmental impact. In relation to the food industry, PPO-induced browning 
causes unacceptable discolouration in fruit and vegetables; from an agricultural perspective, PPO 
can protect plants against pathogens and environmental stress, and can improve ruminant growth by 
increasing nitrogen absorption and decreasing nitrogen loss to the environment through the 
animal's urine. The high PPO legume, red clover, has a significant economic and environmental role 
in sustaining low-input organic and conventional farms. Molecular markers for a range of important 
agricultural traits are being developed for red clover and improved knowledge of PPO genes and 
their structure will facilitate molecular breeding. 

Results: A bacterial artificial chromosome (BAC) library comprising 26,016 BAC clones, with an 
average insert size of 135 Kb, was constructed from Trifolium pratense L. (red clover), a diploid legume 
with a haploid genome size of 440-637 Mb. Library coverage of 6-8 genome equivalents ensured 
good representation of genes; the library was screened for polyphenol oxidase (PPO) genes. 

Two single copy PPO genes, PPO4 and PPO5, were identified to add to a family of three previously 
reported paralogous genes (PPO1-PPO3). Multiple PPO1 copies were identified and characterised, 
revealing a subfamily comprising three variants: PPO1/2, PPO1/4 and PPO1/5. Six PPO genes 
clustered within the genome: four separate BAC clones could be assembled onto a predicted 
190-510 Kb single BAC contig. 

Conclusion: A PPO gene family in red clover resides as a cluster of at least 6 genes. Three of these 
genes have high homology, suggesting a more recent evolutionary event. This PPO cluster covers 
a longer region of the genome than clusters detected in rice or previously reported in tomato. Full- 
length coding sequences from PPO4, PPO5, PPO1/5 and PPO1/4 will facilitate functional studies 
and provide genetic markers for plant breeding. 
Background

Polyphenol oxidases (PPOs) are implicated in a range of 
biological functions in diverse systems. In addition to a 
role in black/brown pigment biosynthesis, PPOs may also 
have protective roles in plants against pathogens and envi- 
ronmental stress. While PPO-induced browning is a 
major problem in the food industry, causing massive 
losses through unacceptable discolouration in fruit and 
vegetables [1,2], it is also implicated in plant defence 
against bacterial and fungal diseases of diverse plant spe- 
cies [3-7]. Down-regulating constitutive and induced 
expression of PPOs in tomato by antisense methods 
resulted in increased pathogen susceptibility [7]. In the 
forage legume Trifolium pratense L. (red clover), PPO activ- 
ity also provides some protection against natural infesta- 
tions of sciarid fly, thrips and aphids under semi- 
controlled conditions [8]. 

PPO activity in red clover is an agriculturally and environ- 
mentally important trait. Red clover provides a significant 
and sustainable component of grazed pastures in low- 
input organic and conventional farms and is harvested for 
conservation as hay or silage in Europe and North Amer- 
ica [9]. Major nutritional benefits of PPO activity have 
been recognised in this crop; high levels of PPO activity 
confer protection against protein degradation by micro- 
organisms in the animal rumen [10,11] and by plant 
enzymes during ensilage [12,13]. Lower protein degrada- 
tion in the rumen and during ensiling results in increased 
nitrogen absorption by ruminants and simultaneously 
decreases nitrogen loss to the environment through the 
animal's urine. 

PPO enzymes are ubiquitous and found in a broad range 
of dicotyledonous and monocotyledonous species. In leg- 
umes only a latent form of PPO enzyme was reported in 
leaves of the grain legume, Vicia faba [14], but active PPO 
enzymes are constitutively expressed in both aerial and 
root tissues in T. pratense. Thus, T. pratense offers an ideal 
opportunity to study a PPO gene family and aspects of 
PPO function. Complete coding sequences, but not pro- 
moter regions, of PPO genes PPO1, PPO2 and PPO3, have 
previously been reported [15]. Expression patterns of the 
three known PPO genes vary in red clover: PPO1 is most 
abundant in young leaves, PPO2 in flowers and petioles, 
and PPO3 in leaves and also possibly in flowers [15]. In 
tomato (Lycopersicon esculentum Mill.), expression profiles 
of a six-member PPO gene family (PPOs A/A', B, C, D, E 
and F) revealed differential PPO expression [7,16]. PPO B 
is highly expressed in young tomato leaves, whereas tran- 
scripts of PPO B, E and F dominate in the inflorescence. 
Specific PPO transcripts are also associated with different 
trichome types. 

The tomato PPO gene family has six paralogous genes, 
which all appear to be clustered on a 165 Kb region on 
chromosome 8 [17]. The genomic relationship between 
members of the T. pratense PPO gene family is unknown, 
but similarities in gene structure and function, combined 
with differences in individual PPO gene expression pro- 
files in red clover [15], suggest that these red clover PPO 
genes are also paralogues. Such gene duplication, fol- 
lowed by divergence from the parent sequence by muta- 
tion and selection or drift, is believed to provide a 
platform for evolutionary change within genomes [18]. 

The haploid genome size of T. pratense has previously 
been estimated as 637 Mb when measured by microden- 
sitometry of Feulgen-stained nuclei [19] and, more 
recently, as 440 Mb when measured by flow cytometry 
[20]. Two red clover libraries already exist [20] but they 
have relatively small insert sizes. Here, we describe the cre- 
ation of a new T. pratense BAC library with a larger insert 
size and its use in isolating additional PPO genes and their 
regulatory regions and in determining the relationship 
between PPO gene family members within the T. pratense 
genome. 

Results 

BAC library construction and validation 

The T. pratense BAC library was constructed from partially 
digested gDNA in a single, high molecular weight, size 
selection experiment. A total of 26,016 BACs were picked 
into 271 96-well plates, with an estimated average insert 
size of 135 Kb per BAC clone, based on 58 randomly 
selected BAC clones (Figures 1 and 2). 

PCR-based screen of BAC library and PPO sequence 
analysis 

The primer pairs specific to PPO2, PPO4 and PPO5 iden- 
tified 5-6 BACs each, indicating one copy of each gene. By 
contrast, the PPO1 primer pair identified at least 28 BAC 
clones (Table 1). All PPO genes were sequenced directly 
from selected BAC clones. An iterative process of sequenc- 
ing and primer design revealed a subfamily of PPO1. 

Three variants, PPO1/2, PPO1/4 and PPO1/5, could be 
clearly distinguished based on their coding regions (Fig- 
ure 3) and were further distinguished by differences in 
their flanking sequences. Primer pairs specific to variants 
PPO1/2 and PPO1/5 initially identified four and nine 
BAC clones, respectively (Table 1). In contrast, at least 26 
BAC clones with PPO1/4 were identified from the PCR- 
based screen of the BAC library (Table 1). 

Sequencing confirmed the presence of PPO1/2 on two 
BAC clones and PPO1/5 on four BAC clones. Five of the 
26 BAC clones harbouring PPO1/4 were analysed further. 
Three of the five BACs also harboured other PPO genes, 
while the remaining two contained PPO1/4 alone; BAC- 
end sequencing showed regions of homology with fully 






BMC Plant Biology 2009, 9:94 



http://www.biomedcentral.com/1471-2229/9/94 




Figure 1 

T. pratense inserts released by NotI digestion from 58 ran- 
domly selected BAC clones. DNA was sepa- 
rated by pulse-field gel electrophoresis (PFGE). BACs were 
generated by restricting T. pratense gDNA with HindIII, separating it by PFGE 
and cloning the size-separated gDNA in the size region of 
150-100 Kb. Molecular weight standards are lane 1, lambda 
ladder (NEB, Beverly, Mass., USA) and lane 2, DNA Molecu- 
lar Weight Marker X (Roche); pIndigoBAC5 NotI vector frag- 
ment is 7 Kb. The average insert size calculated from all 11 
BAC clones in lanes 3-13 is estimated as 113 Kb. 



sequenced BAC 212 G7, indicating that the solitary PPO1/4 
gene resided within this larger BAC clone. 

Further sequence analysis of PPO1/5 revealed that one of 
the four BAC clones contained a 100 bp deletion in 1.7 Kb 
of 3' non-coding flanking region; otherwise there was 
>99.5% identity in both PPO coding and flanking 




Figure 2 

Distribution of DNA insert size of 58 T. pratense BAC 
clones. Insert sizes in Kb were calculated from NotI digests 
of BAC DNA following fractionation by pulse-field gel elec- 
trophoresis. The average insert size of the library was esti- 
mated at 135 Kb. [Histogram; x-axis: BAC clone size (Kb), in bins of 0-50, 51-100, 101-150, 151-200, 201-250 and 251-300.] 



sequences, differing only in six separate, single bases. 
PPO1/5 has the highest homology (99%) with the previ- 
ously reported PPO1 [15]. 

Sequence analysis of PPO4 and PPO5 

Full length coding DNA sequences of PPO4 [GenBank: 
EF183483.1] and PPO5 [GenBank: EF183484.1] were 
deduced from BAC sequences; neither gene contained 
introns. PPO4 and PPO5 sequences encode predicted pro- 
teins comprising 604 and 605 amino acids with molecu- 
lar weights of 68.4 and 68.6 kDa, respectively. Identities 
between the PPO1, PPO2, PPO3, PPO4 and PPO5 genes at 
the cDNA and amino acid sequence levels are 84-94% 
and 70-88%, respectively, with PPO3 and PPO5 showing 
highest homology (Figure 4). Flanking DNA sequences 
show little homology, indicating that the PPO genes are in 
different positions on the genome and therefore verifying 
their separate identities (Table 1). 

PPO gene clusters 

Some BAC clones contained more than one PPO gene and 
this information was used to create a map of a predicted 
PPO cluster (Figure 5). For example, out of five separate 
BAC clones containing PPO1, one contained PPO1/5 
alone (BAC 52 A5), a second contained PPO2, PPO1/2 
and PPO1/5 (BAC 98 A1), a third contained PPO1/2, 
PPO1/5 and PPO5 (BAC 32 D7), a fourth contained 
PPO1/4, PPO1/5 and PPO5 (BAC 212 G7), and a fifth 
contained PPO1/4 and PPO4 (BAC 205 F12). Analysis of 
four of these BAC clones containing 11 identified PPO 
genes provided evidence of a potential cluster of six dis- 
tinct PPO genes within 190-510 Kb (Figure 5). The full 
sequence of BAC 212 G7 confirmed the presence of three 
PPO genes (PPO1/5, PPO5 and PPO1/4) and no other 
plant genes; however, retrotransposons were detected. The 
minimum PPO cluster length is based on 156,267 bp of 
sequence from BAC clone 212 G7 plus sequence from the 
PPO2, PPO1/2 and PPO4 genes and their flanking regions 
and a calculation of sequence overlap between BAC 
clones 205 F12 and 32 D7 with 212 G7. 

Alignment of sequenced BAC 212 G7 and BAC 52 A5, 
containing the single copy of PPO1/5, revealed about 1.5 
Kb identical flanking sequences; in addition, M13 (-20) 
derived BAC-end sequence of BAC 52 A5 was contained 
within BAC 212 G7, indicating that this PPO gene also lies 
within the proposed gene cluster. 

PPO3 has not been identified in this red clover BAC 
library. However, both PPO3 and PPO5 have been 
detected by sequencing PCR products of individual plants 
from cultivars Sabtoron, Britta and Milvus, including the 
genotype used to generate the BAC library, using diagnos- 
tic primers. Coding regions of PPO3 and PPO5 differ 
(88% amino acid and 94% DNA identity; Figure 4), but show 
98% homology over 171 bp of 3' flanking region. 
Table 1: Number of estimated BAC clones, confirmed sequences and predicted copy number of members of the PPO gene family 
identified in a T. pratense BAC library 

Gene          PPO variant   Estimated no. BAC clones   Confirmed no. sequences   Predicted PPO
                            containing PPO             from BAC clones           copy no.
PPO1 (total)                >28                        11                        3-5
              PPO1/2        4                          2                         1
              PPO1/4        >26 a                      5                         1
              PPO1/5        9                          4                         1-2
PPO2                        5                          1                         1
PPO4                        6                          1                         1
PPO5                        5                          2                         1

a 20/47 PCR products from gDNA superpools of BAC library were sequenced and confirmed as PPO1/4. 



A search of the GenBank database revealed that rice has 
two PPO genes in tandem on a 29,943 bp sequence [Gen- 
Bank: AP008210] (Figure 6), with at least one of these rice 
PPO genes being expressed [GenBank: 
NM_001060467.1]. In Medicago truncatula [GenBank: 
AC157507.2] there are two PPOs, which differ by 11%, on 
an 8 Kb genomic sequence, but no equivalent ESTs have 
yet been deposited in the databases. 

Relationship of DNA sequences of PPO 

A phylogenetic analysis of DNA coding sequences con- 
firmed sequence similarities within species, and showed 
differences between PPO sequences from Solanaceous 
and leguminous species (Figure 7; p < 0.01). Bootstrap- 
ping was applied to the datasets to measure how 
consistently the data support given taxon bipartitions. All 
branch support values generated in this study 
are high (>50%) and therefore provide 
uniform support. 

Sequences from different PPO genes of the Solanaceous 
species, Solanum tuberosum and Lycopersicon esculentum 
(Solanum lycopersicon), showed a high level of similarity 
between, as well as within, species (Figure 7). Within the 
legumes, the PPO sequence from Medicago sativa was more 
similar to the two M. truncatula and Vicia faba sequences 
than to the seven T. pratense sequences. In T. pratense, 
PPO1/2, PPO1/4 and PPO1/5 exhibited the highest simi- 
larity, followed by PPO3 and PPO5 (Figure 7). 

Discussion 

Characteristics of BAC library 

The genome size of T. pratense was previously estimated as 
440 Mb [20] and 637 Mb [19]. The average BAC insert size 
was estimated as 135 Kb; therefore, the predicted genome 
coverage of the library was 6-8×. This library comple- 
ments two existing red clover libraries with smaller aver- 
age insert sizes of 80 and 108 Kb [20]. A library with a 
larger insert size offers an advantage in reducing the 
number of clones required for adequate coverage of the 
genome. This will also simplify screening, the generation 
of BAC contigs (as demonstrated in this study) and physical 
mapping. 
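
The coverage figure follows directly from the clone number, the average insert size and the two genome size estimates. The short sketch below reproduces that arithmetic; the function and variable names are illustrative only and are not taken from the original analysis.

```python
def genome_coverage(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Return library coverage in haploid genome equivalents."""
    total_cloned_bp = n_clones * mean_insert_kb * 1_000
    return total_cloned_bp / (genome_mb * 1_000_000)


N_CLONES = 271 * 96       # 271 plates of 96 wells = 26,016 clones
MEAN_INSERT_KB = 135.0    # average insert size estimated from 58 clones

# Genome size estimates: flow cytometry [20] and Feulgen microdensitometry [19]
for genome_mb in (440.0, 637.0):
    coverage = genome_coverage(N_CLONES, MEAN_INSERT_KB, genome_mb)
    print(f"{genome_mb:.0f} Mb genome: {coverage:.1f}x coverage")
```

With these inputs the calculation gives roughly 8.0 and 5.5 genome equivalents, in line with the 6-8× range quoted above once the lower value is rounded.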

PPO copy number 

Numbers of BAC clones in the library containing PPO1, 
PPO2, PPO4 and PPO5 varied from four to >28 (Table 1). 
Between five and six BAC clones each containing PPO2, PPO4 or PPO5 
were detected in the library, suggesting that these genes 
are present as single copies in the red clover genome. Both 
PPO3 and PPO5 were detected in genotypes of three red 
clover cultivars, suggesting separate genes. The high 
homology of their 3' flanking sequences may indicate a 
duplication event. However, PPO3 was not identified in 
the BAC library. This may have resulted from an uneven 
distribution of restriction enzyme recognition sites 
throughout the genome [21]. Regions with low numbers 
of restriction sites may be under-represented, while 
regions with higher numbers of restriction sites may create 
fragments smaller than the cut-off fragment size, which in 
our case was <90 Kb. 
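
As a rough illustration of the reasoning above, the sketch below applies two standard back-of-envelope calculations that are not part of the original analysis: the Clarke-Carbon estimate of the probability that a given sequence is present in a randomly cloned library, and a copy number estimate obtained by dividing the number of positive clones by library coverage. Both assume random, unbiased cloning, which is exactly the assumption that an uneven distribution of restriction sites would violate; the function names are hypothetical.

```python
def prob_sequence_present(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Clarke-Carbon estimate P = 1 - (1 - I/G)^N, assuming random cloning."""
    fraction_per_clone = (mean_insert_kb * 1e3) / (genome_mb * 1e6)
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


def estimated_copy_number(n_positive_clones: int, coverage: float) -> float:
    """Approximate copy number as positive clones divided by library coverage."""
    return n_positive_clones / coverage


# Library parameters reported in the text; genome sizes from [20] and [19].
for genome_mb in (440.0, 637.0):
    p = prob_sequence_present(26_016, 135.0, genome_mb)
    print(f"{genome_mb:.0f} Mb genome: P(sequence present) = {p:.4f}")

# At least 28 PPO1-positive clones at the two ends of the coverage range.
for coverage in (5.5, 8.0):
    copies = estimated_copy_number(28, coverage)
    print(f"coverage {coverage}x: about {copies:.1f} PPO1 copies")
```

Under these assumptions any single-copy sequence would be expected in the library with a probability above 99%, so chance alone is an unlikely explanation for the absence of PPO3, and the PPO1 estimate of roughly 3.5 to 5 copies is of the same order as the 3-5 predicted in Table 1.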

By contrast, a minimum of 28 potential BAC clones con- 
taining PPO1 were identified in the library, indicating 
multiple copies. Sequencing indicated three PPO1 vari- 
ants: PPO1/2, PPO1/4 and PPO1/5 (Figure 3). PPO1/2 
was detected in four BAC clones, indicating a single copy 
in the genome, whilst PPO1/4 was detected in at least 26 
BAC clones, suggesting either multiple copies or an over- 
representation of this gene in the BAC library. The latter is 
most likely since BAC ends of both BAC clones that con- 
tain PPO1/4 alone map onto BAC 212 G7, indicating that 
1 100 

atgatactaaccaaaatagccctaaagaacaagaacaaaaagcatcacctagaagaaatgttctaataggtctaggaggactttatggtgctaccacttt 
atgatactaaccaaaatagicctaaagaacaagaacaaaaa|catcaccaagaagaaatgttctaataggtctaggaggactttatggtgctaccacttt 
atgatactaaccaaaatagigctaaagatcaagaacaaaaa|catcacctagaagaaatgttctaataggtctaggaggactttatggtgctaccacttt 

101 200 
CACAAACAACAACTCACTAGCCTTTGGTGCTCCAGTGCCAATTCCAGATCTCACCTCATGTGTAGTTCCACCAATAGAATTACCAGATGATATAAAAAAA 
TACAAACAACAACTCACTAGCCTTTGGTGCTCCAGTGCCAATTCCAGATCTCACCTCATGTGTAGTTCCACCAATAGAGTTACCAGATGATATGAAAATA 
CACAAACAACAACTCACTAGCCTTTGGTGCTCCAGTGCCAATTCCAGATCTCACCTCATGTGTAGTTCCACCAATAGAATTACCAGATGATATAAAAAAA 
201 300 
ATAGACCCTCCAATCAGTTGTTGTCCACCATTTTCCTCAGATATCATAGATTTTAAGTTCCCTACTTTTAACAAATTAAGGGTAAGACCAGCTGCACAAT 
ATAAACCCTCCACTCAGTTGTTGTCCACCATTTTCCTCAGACATCATAGATTTTAAGTTCCCTACTTTTAAAAAATTAAGGGTAAGACCAGCTGCACAAT 
ATAGACCCTCCAATCAGTTGTTGTCCACCATTTTCCTCAGACATCATAGATTTTAAGTTCCCTACTTTTAACAAATTAAGGGTAAGACCAGCTGCACAAT 
301 400 
TAGTTAATGATGATTATTTTGCAAAATACAATAAAGCCCTTGAACTCATGAGAGCCCTACCAGATGATGATCCAAGAAGTTTTTACCAACAAGCTAACAT 
TAGTTAATGATGATTATTTTGCAAAATACAATAAAGCCCTTGAACTCATGAGAGCCCTACCAAATGATGATCCAAGAAGTTTTTACCAACAAGCTAACAT 
TAGTTAATGATGATTATTTTGCAAAATACAATAAAGCCCTTGAACTCATGAGAGCCCTACCAGATGATGATCCAAGAAGTTTTTACCAACAAGCTAACAT 
401 500 

tcattgtgcttattgtgttggtggttatacacaaaaaggttacgacgttgaactacaagttcataattcttggctatttttgcctttccatcgttggtat 
tcattgtgcttattgtgttggtggttatacacaaaaaggttacgacgctgaactacaagttcataattcttggctatttttgcctttccatcgttggtat 
tcattgtgcttattgtgttggtggttatacacaaaaaggttacga|gttgaactacaagttcataattcttggctatttttgcctttccatcgttggtat 

501 600 
CTTTATTTTTATGAGAGAATCTTAGGTAGTTTAATCAATGACCCTACTTTTGCCATACCCTTTTGGAATTGGGATGCTCCTGATGGCATGCAAATTCCTT 
CTTTATTTTTATGAGAGAATCTTAGGTAGTTTAATCAATGACCCTACTTTTGCCATACCCTTTTGGAATTGGGATGCTCCTGATGGCATGCAAATTCCTT 
CTTTATTTTTATGAGAGAATCTTAGGTAGTTTAATCAATGACCCTACTTTTGCCATACCCTTTTGGAATTGGGATGCTCCTGATGGCATGCAAATTCCTT 
601 700 
CCATTTTTACAAATCCAAATTCTTCCCTTTATGACCCTAGAAGAAATCCCACACATCAACCACCAACAATCGTTGACCTAAACTATAACAGAAAAAATGA 
CCATTTTTACAAATCCAAATTCTTCCCTTTATGACCCTAGAAGAAATCCCf CACATCAACCACCAACAATCGTTGACCTAAACTATAACAAAGCTAATGA 
CCATTTTTACAAATCCAAATTCTTCCCTTTATGACCCTAGAAGAAATCCCfCACATCAACCACCAACAATCGTTGACCTAAACTATAACAAAGCTAATGA 
701 800 
TAACCCTGCTACTAATCCAAGTGCAGAAGAACAAATCAAAATCAACCTTACTTGGATGCATAAACAAATGATCTCCAACAGCAAGACCAATAGACAATTT 
TAACCCTGCTACTAATCCAAGTGCAGAAGAACAAATCAAAATCAACCTTACTTGGATGCATAAACAAATGATCTCCAACAGCAAGACCAATAGACAATTT 
TAACCCTGCTACTAATCCAAGTGCAGAAGAACAAATCAAAATCAACCTTACTTGGATGCATAAACAAATGATCTCCAACAGCAAGACCCCTAGACAATTT 
801 900 
CTTGGAAGCCCTTATCGCGGCGGTGACACACCTTTCAAAGGTGCCGGCTCATTAGAAAATATTCCACATACACCTATTCATATATGGACCGGTGATCCAA 
CTTGGAAGCCCTTATCGCGGCGGTGACACACCTTTCAAAGGTGCCGGCTCATTAGAAAATATTCCACATACACCTATTCATATATGGACCGGTGATCCAA 
CTTGGAAGCCCTTATCGCGGCGGTGACACACCTTTCAAAGGTGCCGGCTCATTAGAAAATATTCCACATACACCTATTCATATATGGACCGGTGATCCAA 
901 1000 

gacagcctcatggagaggacatgggacatttcta|gccgcaggaagagatccacttttttacgctcaccatgcaaatgtggataggatgtggtctgtttg 
gacagcctcatggagaggacatgggacatttcta|gccgccggaagagatccacttttttacgctcaccatgcaaatgtggataggatgtggtctgtttg 
gacagcctcatggagaggacatgggacatttct|ggccgccggaagagatccacttttttacgctcaccatgcaaatgtggataggatgtggtctgtttg 

1001 1100 
GAAAACGTTAGGTAAAAAAAGGAAGGATTTCACTGACCCGGATTGGCTAGAGTCTGAATTTCTCTTTTATGATGAGAATAAGAATCTTGTTAAAGTGAAA 
GAAAACGTTAGGTAAAAAAAGGAAGGATTTCACTGACCCGGATTGGCTAGAGTCTGAATTTCTCTTTTATGATGAGAATAAGAATCTTGTTAAAGTGAAA 
GAAAACGTTAGGTAAAAAAAGGAAGGATTTCACTGACCCGGATTGGCTAGAGTCTGAATTTCTCTTTTATGATGAGAATAAGAATCTTGTTAAAGTGAAA 
1101 1200 
GTCAAGGATAGTGCTAATGATAGAAAGCTTGGTTATGTTTATCAAGATGTTGACATTCCTTGGATAAAATATAAATCTAAACCTAGTAGGAGAGTTAAGT 
GTCAAGGATAGTGCTAATGATAGAAAGCTTGGTTATGTTTATCAAGATGTTGACATTCCTTGGATAAAATATAAATCTAAACCTAGTAGGAGAGTTAAGT 
GTCAAGGATAGTGCTAATGATAGAAAGCTTGGTTATGTTTATCAAGATGTTGACATTCCTTGGATAAAATATAAATCTAAACCTAGTAGGAGAGTTAAGT 
1201 1300 

CTAAGGATAAGAATAAGTCAAC CAGAAAATTGGTTGATAAGTTTCCTATTGTTTTGGATTCGGTTGTGAGTATCATCGTGAAGAG 

CTAAGGATAAGAATAAGTCATCAGCACAACGCCCTTCCAAAAAATTGGTTGATAAGTTTCCTATTGTTTTGGATTCGATTGTGAGTATCATCGTGAAGAG 

CTAAGGATAAGAATAAGTCATCAGCACAACGCCCTTCCAGAAAATTGGTTGATAAGTTTCCTATTGTTTTGGATTCGGTTGTGAGTATCATCGTGAAGAG 

1301 1400 

GCCAAAGAAGTCGAGGAATTCCAAGGAGAAGGAAGATGAAGAGGAGATTTTGGTGATTGATGGGATCGAGTATGATAACAAAACTGAAGTGAAGTTTGAT 

GCCAAAGAAGTCGAGGAATTCCAAGGAGAAGGAAGATGAAGAGGAGATTTTGGTGATTGATGGGATCGAGTATGATAACAAAACTGAAGTGAAGTTTGAT 

GCCAAAGAAGTCGAGGAATTCCAAGGAGAAGGAAGATGAAGAGGAGATTTTGGTGATTGATGGGATCGAGTATGATAACAAAACTGAAGTGAAGTTTGAT 

1401 1428 

GTTATTGTGAATGATGAAGATGATAAGG 

GTTATTGTGAATGATGAAGATGATAAGG 

GTTATTGTGAATGATGAAGATGATAAGG 



Figure 3 

DNA sequence alignment of three variants of PPO1 gene isolated from T. pratense. PPO1/2 is a partial sequence; 
PPO1/4 is complete coding region [GenBank: FJ587214]; PPO1/5 is complete coding region and most similar to published PPO1 
[GenBank: AY017302]. The figure was generated in Vector NTI and formatted in Word. 



the solitary PPO1/4 gene actually resides within the PPO 
cluster. PPO1/5 was detected in a total of nine BAC 
clones, representing one or possibly two predicted copies. 
Four PPO1/5 genes were sequenced; while three were 
identical, the fourth had near-identical 
gene and flanking sequences and a 100 bp deletion within 1.7 Kb 
of the 3' flanking region, suggesting allelic varia- 
tion. 



PPO family of genes and genome structure 

The results presented in this manuscript indicate that 
there are five distinct paralogous genes in the red clover 
multigene PPO family: PPO1-PPO5. The BAC library has 
yielded full length gene sequences and upstream regula- 
tory regions for two new PPO genes, PPO4 and PPO5, and 
for two variants of PPO1, PPO1/5 and PPO1/4. There 
were no introns identified in the newly identified red clo- 
Figure 6 

Schematic representation of PPO gene cluster in rice 
taken from rice chromosome 4 [GenBank: 
AP008210.1 31754771-31786730]. PPO1: [Gen- 
Bank: AK108237.1] (DNA), [PDB: CAE03510.2] (amino acid); 
PPO2: [PDB: CAH66801.1] (amino acid). 



Figure 4 

Red clover PPO identities at the cDNA and amino 
acid levels. 



ver PPO genes and variants. This was in agreement with 
results reported previously for PPO in other dicotyledo- 
nous species, including hybrid poplar [22], potato [23], 
tomato [17] and red clover [15], and as predicted from M. 
truncatula genomic sequences [GenBank: AC157507.2], 
but is in contrast to PPO genes identified in monocotyle- 
donous species, such as pineapple [24], wheat [GenBank: 
EF070147 to GenBank: EF070150 [25]], rice [GenBank: 
AP008210], Lolium perenne [GenBank: FJ587212] and Fes- 
tuca pratense [GenBank: FJ587213]. 

The occurrence of multiple PPOs on single BAC clones 
and the putative alignment of four BAC clones with six 
distinct PPO genes on an estimated 190-510 Kb fragment 
is strong evidence for a PPO gene cluster in T. pratense 
(Figure 5). The order and presence of three PPO genes 
were confirmed by sequencing a 156,267 bp BAC clone, 
212 G7. Similar PPO clusters were previously reported in 
tomato [16] where seven genes were reported as clustered 
Figure 5 

Diagram of cluster of 6 PPO genes detected on four 
separate BAC clones. The four BAC clones have been 
aligned based on detection of specific PPO genes by PCR; the 
cluster is estimated to span a maximum of 510 Kb. 



over 165 Kb, and have been detected both in M. truncatula, where 
there are two PPO genes present in 8 Kb of sequence [Gen- 
Bank: AC157507.2], and in rice, where two active PPO 
genes and a redundant PPO pseudogene (Figure 6; [Gen- 
Bank: AP008210.1]) are present in 30 Kb of sequence; rice 
PPO2 also contains an 11.3 Kb retrotransposon-like insert 
exhibiting 94% homology with a gypsy-type retrotranspo- 
son in rice [GenBank: AB030283] [26]. Retrotransposon 
insertion into the maize waxy gene does not appear to 
have impaired protein coding ability [27]. 

No other genes were identified in the vicinity of the red 
clover PPO cluster, although retrotransposons and 
regions of homology with M. truncatula and Lotus japoni- 
cus genomic sequence were found on the sequenced BAC 
212 G7. Retrotransposons are implicated in gene duplica- 
tion, altering patterns of gene expression and generating 
new functions in legumes and maize [27-29]. 

Clustering of duplicated genes is a well-established phe- 
nomenon in plants. This could influence gene function 
and facilitate co-ordinated expression, and, in duplicated 
genes, such as PPO genes, minor changes in position may 
allow subtle changes in regulation, which may benefit the 
plant under new selection regimes by creating novel tis- 
sue-specific or environmentally induced expression. 

Evolutionary implications 

Gene clustering and the occurrence of paralogous 
sequences in the PPO gene family can hint at underlying 
gene evolution and function mechanisms. For example, 
paralogous genes are widely recognised and expected to 
have diverged by a minimum of 10% over time [30]. Four 
of the five PPO genes have diverged by 10% or more at the 
cDNA or amino acid levels (Figure 4), whereas PPO3 [15] 
and the newly sequenced PPO5 share 94% identity. This 
is substantially higher than the 80-90% identity expected 
for ancient paralogues. Nearly identical paralogues (NIPs) 
have been defined as paralogous genes that exhibit >98% 
identity [30]. Such NIPs are claimed to allow differential 
expression within the gene family and increase plasticity 
Figure 7 

Phylogenetic tree of coding DNA sequences of 
selected PPO genes and gene families. DNA sequences 
of all selected plant species were aligned with the shortest 
available PPO sequence (PPO1/2 at 1413 bp). These 
sequences included the conserved tyrosinase domain. Ln 
Likelihood = -22213.8963; p < 0.01. Species names and PPO 
annotation were abbreviated for convenience. Lycopersicon 
esculentum Le PPOA/A' [GenBank: Z12833], Le PPOB [Gen- 
Bank: Z12834], Le PPOC [GenBank: Z12835], Le PPOD 
[GenBank: Z12836], Le PPOE [GenBank: Z12837], Le PPOF 
[GenBank: Z12838]; Medicago sativa Ms PPO [GenBank: 
AY283062]; M. truncatula Mt PPO1 and Mt PPO2 [GenBank: 
AC157507.2]; Nicotiana tabacum Nt PPO 
[GenBank: Y12501]; Solanum tuberosum St PPO32 [Gen- 
Bank: U22921], St PPO33 [GenBank: U22922]; Trifolium prat- 
ense Tp PPO2 [GenBank: AY017303], Tp PPO3 
[GenBank: AY017304], Tp PPO4 [GenBank: EF183483.1], Tp 
PPO5 [GenBank: EF183484.1]. 



of the transcriptome [30]. In red clover, variants of PPO1 
may be considered as NIPs: PPO1/2, PPO1/5 and PPO1/4 
exhibit more than 98% identity. 
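
To make the identity thresholds concrete, the following minimal sketch computes percent identity between two pre-aligned sequences and applies the >98% NIP cut-off cited from [30]. It is an illustration only: the toy sequences are invented, the function names are hypothetical, and the identities reported in this study were derived from full alignments rather than from this code.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a.upper(), b.upper())
        for a, b in zip(seq_a, seq_b)
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def classify_paralogues(identity: float) -> str:
    """Apply the >98% identity threshold used to define NIPs [30]."""
    if identity > 98.0:
        return "nearly identical paralogue (NIP)"
    return "diverged paralogue"


# Invented 100-base toy alignment with a single mismatch in the last column.
toy_a = "ATGC" * 25
toy_b = toy_a[:-1] + "A"
identity = percent_identity(toy_a, toy_b)
print(identity, classify_paralogues(identity))

# The 94% DNA identity reported for PPO3 and PPO5 falls below the NIP threshold.
print(classify_paralogues(94.0))
```

Gapped columns are skipped here for simplicity; other conventions, such as counting gaps as mismatches, would give slightly different values.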

The different PPOs, including the three NIPs of PPO1, 
have presumably arisen due to partial genome duplica- 
tion, the extent of divergence relating to the timing of the 
duplication event(s). PPO gene sequences vary considera- 
bly, forming clear phylogenetic groups for higher plants, 
vertebrates, fungi and bacteria [31]. DNA sequences show 
high homology within species and within families, such 
as Solanaceae (Solanum, Lycopersicon and Nicotiana spe- 
cies) and Fabaceae (Vicia, Trifolium and Medicago species) 
(Figure 7). 



The divergence of PPO genes within red clover is similar 
to that observed within other plant species. For example, 
the two PPO genes identified in M. truncatula have 90% 
identity [GenBank: AC157507.2] whereas the seven clus- 
tered genes [GenBank: Z12833, Z12834, Z12835, 
Z12836, Z12837, Z12838] in the tomato PPO family have 
between 73 and 97% identity [17]. 

Red clover possesses a large, functional PPO gene family 
(Figure 7). While PPO enzymes are expressed constitu- 
tively in aerial and root tissues of T. pratense, PPO 
enzymes only exist in a latent or inactive form in leaf tis- 
sue of both T. repens (unpublished data) and V. faba [14]. 
By contrast, PPO activity is not detected in other agro- 
nomically important forage legumes, such as Medicago 
sativa (alfalfa) and Lotus corniculatus (birdsfoot trefoil), or 
in the model species M. truncatula and L. japonicus 
(unpublished data). At least one PPO gene is present in M. 
sativa, and two in M. truncatula, yet to date no ESTs have 
been reported for either species. It is possible that condi- 
tions have not yet been determined that elicit PPO gene 
expression in these species, but the apparent lack of PPO 
transcript concurs with failure to detect PPO enzyme 
activity in tissues of either species. 

These observations raise questions about the evolution of 
PPO genes both within T. pratense and between T. pratense 
and its close relatives. Phylogenetic trees of divergence of 
T. pratense PPO DNA sequences (Figure 7) confirm the 
level of identity of red clover PPO at the genetic level, with 
PPO1 NIPs being most similar and probably, therefore, 
most recently diverged [22,32]. 

Diversification of plant genomes is powered in part by 
gene duplication, which can result in new gene functions 
[33]. Such gene duplication may occur by creation of 
polyploids, by segmental duplication or duplication in 
tandem arrays resulting in the production of gene clusters. 
Positive selection is believed to play a crucial role in the 
retention of such duplicated genes [33] but the effect of 
positive selection on tandem arrays or clusters of genes is 
not clear [18]. Over time, individual PPO genes and PPO 
clusters may have originated, duplicated and subse- 
quently been lost, their function governed by mutations 
in regulatory elements. A comparison of selected PPO 
DNA sequences in both red clover and tomato (Figure 7) 
indicates that such gene duplication has occurred leading 
to clusters of six or seven similar PPO genes, each with 
known, different expression patterns. 

PPO localisation and function 

The biological effects of PPO appear to be subtle, possibly 
requiring specific or even multiple triggers for expression 
in vivo. Enhanced localised PPO expression under biotic 
and abiotic stress provides evidence of its involvement in 
plant protection in various species, for example, localised 
PPO expression in leaf abscission zones during drought 
[34]. A multiple regulatory trigger might explain the pre- 
requisite of plant hardening, by low temperature and low 
light mimicking autumn conditions, before any difference 
in susceptibility to Sclerotinia trifoliorum was detected 
between late and medium-late flowering types of red clo- 
ver [35]. Similarly, differences in survival of low PPO- 
mutant and wild-type red clover plants only became 
apparent under multiple, natural infestations [8]. 

The high degree of homology in active sites of red clover 
PPO indicates similar enzymic properties. However, dif- 
ferences do occur in localisation of PPO enzyme activity 
and specific PPO gene expression in both red clover [15] 
and tomato [7,16,17], suggesting significant differences in 
their regulatory elements; this is supported by observed 
differences in sequenced promoter regions of four red clo- 
ver PPO genes. Red clover PPO genes are differentially 
expressed in aerial tissues and root tissues [15] conferring 
the potential for enhanced or localised expression follow- 
ing differing abiotic and biotic stimuli. 

Conclusion 

The red clover BAC library has yielded novel full-length 
gene sequences of PPO4, PPO5 and the PPO1 NIP PPO1/4, 
which will be used in functional studies involving tech- 
niques such as RNAi, and the PPO promoter sequences 
will be used for localisation studies using pro- 
moter:reporter gene fusions. It has also revealed recent 
gene duplication events in the form of NIPs and evidence 
of gene clustering. The BAC library will provide a useful 
tool for the map-based cloning of target QTL, physical 
mapping, genome structure analyses and the alignment of 
specific regions of the T. pratense genome with those of its close rel- 
atives, the model legume M. truncatula and other legume 
species such as alfalfa, revealing any genomic changes or 
divergence at these sites. The high degree of synteny 
between T. pratense and T. repens with both M. truncatula 
and M. sativa [20,36] will allow comparative mapping 
between model and agronomically important legumes. 

Methods 

Construction of the red clover BAC library was based on 
procedures described previously [37]. 

Isolation of high molecular weight genomic DNA 

High molecular weight (HMW) DNA was isolated from a 
single genotype of diploid T. pratense cultivar Milvus (2n 
= 2x = 14). The plants were maintained in darkness for 42 
h prior to harvesting a total of 21.9 g leaf tissue. The leaf 
tissue was frozen and stored at -80 °C. Leaf tissue was 
ground in liquid nitrogen and nuclei isolated [38]. The 
nuclei were embedded in agarose plugs and, before 
digestion, the HMW DNA was subjected to a 
pre-electrophoresis step on a 1% (w/v) agarose (Sigma, St Louis, 
MO, USA) gel using a CHEF-DR II PFGE apparatus (Bio-Rad, 
Hercules, CA, USA) [39,40]. 

Partial digestion and size selection of digested DNA 

The entire library was generated from a single size 
selection experiment. T. pratense DNA was partially digested 
using HindIII (Roche, Mannheim, Germany) and 
separated in a single step, on a 1% (w/v) pulsed field certified 
agarose gel, by PFGE at 5.2 V cm⁻¹ for 16 h with a linear 
pulse ramp from 0.5 to 40 s using a CHEF-DR II apparatus 
(Bio-Rad). Partial digestion was performed using a low 
enzyme concentration (0.5 U/plug) at 37 °C for 1 h, 
which in preliminary studies resulted in a smear of DNA 
between 160 Kb and 90 Kb but no significant DNA below 
this on the gel. 

Following electrophoresis, the flanking regions of the gel 
containing HMW DNA ladder (lambda ladder PFG 
marker; NEB, Beverly, MA) were stained with ethidium 
bromide and marked under UV so that alignment with the 
unstained gel allowed the selection of one gel slice in the 
range of 100-150 Kb. This gel slice was then excised and 
the partially digested genomic DNA recovered by dialysis 
[39]. 

Ligation and transformation 

The partially digested DNA was ligated with HindIII-digested 
pIndigoBAC-5 vector (Epicentre Biotechnologies, 
Madison, WI, USA) using a predicted vector/insert 
molar ratio of between 5:1 and 10:1. Ligations were 
carried out in 1x T4 DNA ligase buffer at 14 °C overnight 
using 1 Weiss unit of T4 DNA ligase (Roche) per 50 µl of 
ligation buffer. The ligation reaction was drop dialysed 
and 1 µl of the ligation product was transformed into 20 
µl of Escherichia coli ElectroMAX DH10B competent cells 
(Invitrogen, Carlsbad, CA, USA) by electroporation 
(GenePulser II; Bio-Rad). Transformed cells were allowed 
to recover in 1 ml SOC medium (2% w/v bacto tryptone, 
0.5% w/v bacto yeast extract, 10 mM NaCl, 2.5 mM KCl, 
10 mM MgCl2, 20 mM glucose, pH 7.0) at 37 °C for 45 
min with shaking at 180 rpm, then plated on LB plates 
containing 12.5 µg ml⁻¹ chloramphenicol and 
incubated at 37 °C overnight [37,41]. 
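
The vector/insert molar ratio quoted above can be turned into DNA masses with a short calculation. The sketch below is illustrative only: the vector length (about 7.5 Kb), the insert length (midpoint of the 100-150 Kb gel slice) and the 100 ng of insert are assumptions made for the example, not values reported in this study.

```python
def vector_mass_ng(insert_ng, insert_bp, vector_bp, molar_ratio):
    """Mass of vector (ng) giving `molar_ratio` vector molecules per insert molecule."""
    # For double-stranded DNA, moles are proportional to mass / length,
    # so the per-base-pair molar mass cancels out of the ratio.
    return molar_ratio * insert_ng * vector_bp / insert_bp


VECTOR_BP = 7_500    # assumed approximate vector length (not given in the text)
INSERT_BP = 125_000  # assumed midpoint of the 100-150 Kb size-selected slice
INSERT_NG = 100.0    # hypothetical amount of recovered insert DNA

for ratio in (5, 10):
    ng = vector_mass_ng(INSERT_NG, INSERT_BP, VECTOR_BP, ratio)
    print(f"{ratio}:1 vector/insert -> {ng:.1f} ng vector per {INSERT_NG:.0f} ng insert")
```

Because the inserts are far longer than the vector, even a 10:1 molar excess of vector corresponds to less vector than insert by mass.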

Picking and storing 

BAC colonies were picked in duplicate into 200 µl of 
Freezing Broth (LB, 36 mM K2HPO4, 13.2 mM KH2PO4, 
1.7 mM Na citrate, 0.4 mM MgSO4, 6.8 mM (NH4)2SO4, 
4.4% v/v glycerol, 12.5 µg ml⁻¹ chloramphenicol) in 
96-well microtitre plates using a GloPix robot (Genetix, New 
Milton, Hampshire, UK). Following overnight incubation 
at 37 °C, plates were stored at -80 °C. A total of 26,016 
BAC clones were picked into 271 96-well plates. 

Determination of insert size of BAC clones 

A total of 58 BAC clones were selected at random and 
cultured overnight, and their insert sizes were determined. 
Following NotI restriction digestion, isolated DNA was 
separated by PFGE in the presence of molecular weight 
markers in order to estimate the average insert size of the 
cloned DNA. 
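
The link between clone number, average insert size and genome coverage can be checked with a few lines of arithmetic. The sketch below uses the plate count given above together with the average insert size (135 Kb) and the haploid genome size range (440-637 Mb) reported for this library. It assumes that every clone carries an insert of average size, and it uses the standard Clarke-Carbon formula for the chance of recovering a given single-copy sequence; neither assumption is a statement of how coverage was calculated in this study.

```python
PLATES = 271
WELLS_PER_PLATE = 96
MEAN_INSERT_KB = 135          # average insert size reported for the library
GENOME_MB_RANGE = (440, 637)  # reported haploid genome size estimates

n_clones = PLATES * WELLS_PER_PLATE


def coverage(n, insert_kb, genome_mb):
    """Genome equivalents represented by n clones of the given mean insert size."""
    return n * insert_kb / (genome_mb * 1000)


def p_recover(n, insert_kb, genome_mb):
    """Clarke-Carbon probability that a given single-copy locus is in the library."""
    fraction = insert_kb / (genome_mb * 1000)
    return 1 - (1 - fraction) ** n


print(f"{n_clones} clones")
for genome_mb in GENOME_MB_RANGE:
    cov = coverage(n_clones, MEAN_INSERT_KB, genome_mb)
    prob = p_recover(n_clones, MEAN_INSERT_KB, genome_mb)
    print(f"{genome_mb} Mb genome: {cov:.1f} genome equivalents, P = {prob:.3f}")
```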

Pooling of BAC library for PCR-based screening 

The library was replicated in microtitre plates and plate 
cultures pooled in such a way as to enable a PCR-based 
screen of the library [37]. A total of 271 microtitre plates 
of clones were used as the basis for the screen. Each plate 
was represented in three superpools so that, following 
DNA extraction, a PCR screen of 147 DNA superpools 
would generate three positive amplifications per positive 
BAC colony. Once the superpools had been created, 50 ml 
plastic tubes containing the pooled cultures from up to 
seven plates were centrifuged at 5000 rpm in a model 
5403 centrifuge (Eppendorf, Hamburg, Germany). The 
supernatants were discarded and the pellets frozen at 
-80 °C. BAC DNA was isolated from the stored pellets 
using an alkaline lysis method, which included RNase in 
the resuspension buffer. Superpool DNA was precipitated 
using isopropanol, the pellet washed with 70% ethanol 
and resuspended in TE. 
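
The exact assignment of plates to superpools is described in [37] and is not reproduced here. As an illustration only, the sketch below shows one hypothetical layout that is consistent with the figures above: plates are placed on an assumed 7 x 7 x 7 grid and pooled along each pair of axes, which gives 3 x 49 = 147 superpools, places every plate in exactly three of them, and keeps each pool to no more than seven plates. The grid, the coordinate mapping and all function names are assumptions made for this example.

```python
from collections import Counter

N_PLATES = 271
SIDE = 7  # assumed grid side: 7 x 7 x 7 = 343 positions for 271 plates


def coords(plate):
    """Map a 1-based plate number to hypothetical (x, y, z) grid coordinates."""
    i = plate - 1
    layer, x, y = i // (SIDE * SIDE), (i // SIDE) % SIDE, i % SIDE
    # Shift z diagonally so that the partly filled last layer is spread
    # across pools rather than leaving some pools empty.
    return x, y, (layer + x + y) % SIDE


def superpools(plate):
    """The three superpools containing a plate, one per pair of axes."""
    x, y, z = coords(plate)
    return {("XY", x, y), ("XZ", x, z), ("YZ", y, z)}


def decode(positive_pools):
    """Plates whose three superpools all gave a PCR product."""
    positive = set(positive_pools)
    return [p for p in range(1, N_PLATES + 1) if superpools(p) <= positive]


pool_sizes = Counter(pool for p in range(1, N_PLATES + 1) for pool in superpools(p))
print(len(pool_sizes), max(pool_sizes.values()))
print(decode(superpools(212)))  # plate 212 is an arbitrary example
```

With a single positive clone the three amplifying superpools point to one plate. When several plates are positive for the same gene, the intersection can include false candidates, so plate-level confirmation by PCR is still needed.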

PCR-based screen of the BAC library 

The DNA superpools of the BAC library were screened 
using PCR primers for amplification of individual genes. 
PCR primers were designed from sequences of five T. 
pratense PPO genes: PPO1 [GenBank: AY017302.1], PPO2 
[GenBank: AY017303.1] and PPO3 [GenBank: 
AY017304.1], and from partial PPO4 and PPO5 sequences 
identified in this study. Following the initial BAC library 
screen, PCR primer pairs were also designed for the PPO1 
variants PPO1/2, PPO1/4 and PPO1/5 (Table 2). 

Isolation and identification of genes in PPO family 

PPO fragments were generated by PCR from red clover 
genomic DNA (cultivar Milvus) with degenerate primers 
based on regions of homology to PPO genes from T. 
pratense and Vicia faba (PPO deg; Table 2). PCR amplification 
products were visualised on an agarose gel, excised, 
purified and cloned into E. coli (Invitrogen TOPO TA Cloning® 
kit with pCR®2.1-TOPO® vector and TOP10 One Shot® 
cells). Inserts were sequenced and a number of PPO genes 
were detected, including two novel genes designated 
PPO4 and PPO5. PPO4 and PPO5 were isolated from the 
BAC library using gene-specific primers (Table 2). Three 
variants of PPO1 (PPO1/2, PPO1/4 and PPO1/5) were also 
sequenced. These PPO1 variants were designated codes 
according to their juxtaposition with other PPO genes on 
BAC clones: PPO1/2, PPO1/4 and PPO1/5 were initially 
detected on BAC clones along with PPO2, PPO4 or PPO5, 
respectively. Once identified, selected PCR-positive BACs 
for each gene were sequenced directly using specific 
primers (Table 2) and an ABI Prism 3100 DNA analyser 
(Applied Biosystems, Warrington, UK). BAC walking was 
used to generate full-length gene and upstream promoter 
sequences of PPO genes. 

Sequencing and in-silico analysis 

Sequencing of PCR products and BAC clone plasmids was 
carried out using an ABI-3100 Genetic Analyser (Applied 
Biosystems) using fluorescent dye terminators. A BAC 
clone (designated 212 G7) harbouring genes PPO1/5, 
PPO5 and PPO1/4 was fully sequenced on a Roche 454 
GS-FLX™ system, giving an average of 30,000 reads or 6 
Mb of data (Cogenics). 

Table 2: PCR product size and PCR primer pairs used to amplify PPO genes 

Gene      PCR product (bp)   Forward                     Reverse 
PPO1      401                CGGCGGTGACACACCTTTC         GGAAGGGCGTTGTGCTGATG 
PPO2      229                CAACAAGAAGAAGGAGAAGAAG      AGCACCACCACGAGAAGAAT 
PPO4      150                ACGAAGGTGGCGTAGATGAC        CATTTCCATGGTGAGCGTAA 
PPO5      391                GCAAATCTAAGGAGGATCCTACCG    AGTCTCTAGCCAATCATCGTC 
PPO1/5    200                GGAATGTCAAAATTAGTGGC        ACATTGATTACAATATATTCC 
PPO1/2    140                ATCGTTGACCTAAACTATAACAG     GAAAGGTGTGTCACCGCC 
PPO1/4    200                ATAGAAAACCAAAGCACC          ATTTTCATATCATCTGGTAAC 
PPO deg   764-771            GCCMYTRAWCTCATGARAGC        CTCATCATARAARAGAAA 

PPO deg - PPO degenerate primers 

Sequences were assembled and further analysed using 
Vector NTI software and the NCBI/BLAST and FASTA 
programs. Sequences were compared to public DNA, EST and 
protein (NCBI) databases and existing red clover PPO 
gene sequences to confirm their identity. 

PPO DNA sequences were aligned in Vector NTI Advance 
10, based on the ClustalW algorithm, and displayed in 
PHYLIP 3.67. For valid comparisons, DNA sequences of 
all selected plant species were aligned with the shortest 
available PPO sequence (PPO1/2; 1413 bp) and truncated 
in line with this sequence: the truncated sequences 
contain the conserved domain. DNA sequence data were 
analysed statistically by the maximum likelihood method and 
the phylogenetic tree was generated using PHYLIP 
(http://evolution.genetics.washington.edu/phylip.html) [42]. 
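
Truncating an alignment in line with its shortest member amounts to keeping only the columns spanned by that sequence. The sketch below shows one simple way to do this on a toy alignment; it is not the Vector NTI procedure used in this study, and the sequence names and bases are invented for illustration.

```python
def trim_to_reference(alignment, ref_name):
    """Keep only the alignment columns spanned by the reference sequence."""
    ref = alignment[ref_name]
    # Columns where the reference has a base rather than a gap character.
    cols = [i for i, base in enumerate(ref) if base != "-"]
    start, end = cols[0], cols[-1] + 1
    return {name: seq[start:end] for name, seq in alignment.items()}


# Toy aligned sequences of equal length; "-" marks an alignment gap.
toy = {
    "short_ref": "---ATGGCA--",
    "seq_a":     "CCGATGGCATT",
    "seq_b":     "-CGATGACATT",
}

trimmed = trim_to_reference(toy, "short_ref")
for name, seq in trimmed.items():
    print(f"{name:10s}{seq}")
```

Trimming to a common span means every sequence contributes the same alignment columns to the likelihood calculation, so longer sequences do not add sites that are missing from the shortest one.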

Accession numbers of new red clover and grass PPO sequences 

The identified PPO gene sequences were submitted to 
GenBank: Trifolium pratense PPO4 [GenBank: EF183483.1], 
PPO5 [GenBank: EF183484.1], PPO1/4 [GenBank: 
FT587214]; Lolium perenne [GenBank: FT587212]; Festuca 
pratensis [GenBank: FT587213]. 